Comparing the hierarchy of author given tags and repository given tags in a large document archive
Folksonomies - large databases arising from collaborative tagging of items by
independent users - are becoming an increasingly important way of categorizing
information. In these systems users can tag items with free words, resulting in
a tripartite item-tag-user network. Although there are no prescribed relations
between tags, the way users think about the different categories presumably has
some built in hierarchy, in which more special concepts are descendants of some
more general categories. Several applications would benefit from the knowledge
of this hierarchy. Here we apply a recent method to check the differences and
similarities of hierarchies resulting from tags given by independent
individuals and from tags given by a centrally managed repository system. The
results from our method showed substantial differences between the lower parts
of the hierarchies and, in contrast, a relatively high similarity at the top of
the hierarchies. Comment: 10 pages
Extracting tag hierarchies
Tagging items with descriptive annotations or keywords is a very natural way
to compress and highlight information about the properties of the given entity.
Over the years several methods have been proposed for extracting a hierarchy
between the tags for systems with a "flat", egalitarian organization of the
tags, which is very common when the tags correspond to free words given by
numerous independent people. Here we present a complete framework for automated
tag hierarchy extraction based on tag occurrence statistics. Along with
proposing new algorithms, we are also introducing different quality measures
enabling the detailed comparison of competing approaches from different
aspects. Furthermore, we set up a synthetic, computer generated benchmark
providing a versatile tool for testing, with a couple of tunable parameters
capable of generating a wide range of test beds. Besides the computer-generated
input we also use real data in our studies, including a biological example with
a pre-defined hierarchy between the tags. The encouraging similarity between
the pre-defined and reconstructed hierarchy, as well as the seemingly
meaningful hierarchies obtained for other real systems indicate that tag
hierarchy extraction is a very promising direction for further research with a
great potential for practical applications. Comment: 25 pages with 21 pages of supporting information, 25 figures
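As a rough illustration of what hierarchy extraction from tag occurrence statistics can look like, the following minimal Python sketch counts tag co-occurrences and attaches each tag to a more frequent tag it co-occurs with. This is a naive heuristic chosen purely for illustration, not the algorithm proposed in the paper; the function name `extract_hierarchy`, the scoring rule and the toy data are all assumptions.

```python
from collections import Counter
from itertools import combinations


def extract_hierarchy(items):
    """Return a {child: parent} map built from tag co-occurrence counts.

    `items` is a list of tag sets, one per tagged object.
    Illustrative heuristic (an assumption, not the paper's method):
    a tag's parent is the strictly more frequent tag that maximises
    co-occurrence count divided by the candidate parent's frequency.
    """
    freq = Counter()  # number of items carrying each tag
    cooc = Counter()  # number of items carrying both tags of a pair
    for tags in items:
        tags = set(tags)
        freq.update(tags)
        for pair in combinations(sorted(tags), 2):
            cooc[pair] += 1

    def pair_count(a, b):
        # Pairs are stored with their tags in sorted order.
        return cooc[tuple(sorted((a, b)))]

    parents = {}
    for tag in freq:
        candidates = [
            other for other in freq
            if freq[other] > freq[tag] and pair_count(tag, other) > 0
        ]
        if candidates:
            # The tag name is the second sort key so ties break deterministically.
            parents[tag] = max(
                candidates,
                key=lambda other: (pair_count(tag, other) / freq[other], other),
            )
    return parents


# Hypothetical toy data: each set holds the tags of one item.
items = [
    {"animal", "dog"},
    {"animal", "dog", "poodle"},
    {"animal", "cat"},
    {"animal", "dog", "poodle"},
    {"animal"},
]
hierarchy = extract_hierarchy(items)
```

On this toy input the most frequent tag ("animal") ends up without a parent and therefore acts as the root; real folksonomy data would need a more careful scoring rule and some noise filtering.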
Detecting and classifying lesions in mammograms with Deep Learning
In the last two decades Computer Aided Diagnostics (CAD) systems were
developed to help radiologists analyze screening mammograms. The benefits of
current CAD technologies appear to be contradictory and they should be improved
to be ultimately considered useful. Since 2012 deep convolutional neural
networks (CNN) have been a tremendous success in image recognition, reaching
human performance. These methods have greatly surpassed the traditional
approaches, which are similar to currently used CAD solutions. Deep CNNs have
the potential to revolutionize medical image analysis. We propose a CAD system
based on one of the most successful object detection frameworks, Faster R-CNN.
The system detects and classifies malignant or benign lesions on a mammogram
without any human intervention. The proposed method sets the state-of-the-art
classification performance on the public INbreast database, AUC = 0.95. The
approach described here achieved 2nd place in the Digital Mammography
DREAM Challenge with AUC = 0.85. When used as a detector, the system reaches
high sensitivity with very few false positive marks per image on the INbreast
dataset. Source code, the trained model and an OsiriX plugin are available
online at https://github.com/riblidezso/frcnn_cad
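For readers unfamiliar with the AUC figures quoted above: the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as one half. The minimal Python sketch below computes it directly from that definition. The function name `auc` and the toy labels and scores are invented for illustration and have no connection to the INbreast or DREAM data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of classifier scores, higher means "more positive"
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy values, not taken from the paper.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations; rank-based formulas give the same value more efficiently.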
Komplex hálózatok szerkezete és dinamikája = Structure and dynamics of complex networks
The network approach is presently the most efficient tool to study complex systems. We broadened the framework of theoretical description by generalizing concepts to the case of weighted networks, analyzing in detail community detection algorithms, constructing a new detection method and analyzing the limitations of the procedures. Using stock market data as an example, we studied ways of efficiently denoising the correlation matrix. Using communication data, we proved for the first time on a societal scale the Granovetter hypothesis ("The strength of weak ties") on the social network, and constructed a working model on that basis. One of the most important dynamic phenomena on networks is spreading. We investigated how the topology and its relation to the link weights affect such phenomena, and what the mechanism of catastrophic cascades is. We proved that the inhomogeneous, bursty character of human behavior substantially influences the speed at which information spreads. We can conclude from our investigations that, although very different networks may show surprisingly similar properties, they obey very different optimization principles from the point of view of their functioning. Finally, we showed that dynamics on complex networks, and in complex systems in general, exhibits fluctuation scaling; we analyzed its possible origins and the phenomena that go beyond simple scaling.
Ontologies and tag-statistics
Due to the increasing popularity of collaborative tagging systems, the
research on tagged networks, hypergraphs, ontologies, folksonomies and other
related concepts is becoming an important interdisciplinary topic of great
timeliness and relevance for practical applications. In most collaborative
tagging systems the tagging by the users is completely "flat", while in some
cases they are allowed to define a shallow hierarchy for their own tags.
However, usually no overall hierarchical organisation of the tags is given, and
one of the interesting challenges of this area is to provide an algorithm
generating the ontology of the tags from the available data. In contrast, there
are also other types of tagged networks available for research, where the tags
are already organised into a directed acyclic graph (DAG), encapsulating the
"is a sub-category of" type of hierarchy between each other. In this paper we
study how this DAG affects the statistical distribution of tags on the nodes
marked by the tags in various real networks. We analyse the relation between
the tag-frequency and the position of the tag in the DAG in two large
sub-networks of the English Wikipedia and a protein-protein interaction
network. We also study the tag co-occurrence statistics by introducing a 2D
tag-distance distribution preserving both the difference in the levels and the
absolute distance in the DAG for the co-occurring pairs of tags. Our most
interesting finding is that the local relevance of tags in the DAG (i.e.,
their rank or significance as characterised by, e.g., the length of the
branches starting from them) is much more important than their global distance
from the root. Furthermore, we also introduce a simple tagging model based on
random walks on the DAG, capable of reproducing the main statistical features
of tag co-occurrence. Comment: Submitted to New Journal of Physics
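The abstract does not spell out the random-walk tagging model, so the following Python sketch is only one possible reading of the idea: start from a tag in the DAG, take a few random steps towards more general categories, and assign every visited tag to the item. The function name `random_walk_tags`, the upward walk direction, the fixed step count and the toy DAG are all assumptions made for illustration.

```python
import random


def random_walk_tags(parents, start, steps, rng):
    """Collect the tags visited by a short random walk on a tag DAG.

    parents: dict mapping each tag to a list of its parent tags
             (edges point from specific to more general categories)
    start:   tag at which the walk begins
    steps:   maximum number of steps to take
    rng:     random.Random instance, passed in for reproducibility
    """
    current = start
    visited = {current}
    for _ in range(steps):
        ups = parents.get(current, [])
        if not ups:
            break  # reached a root of the DAG
        current = rng.choice(ups)
        visited.add(current)
    return visited


# Hypothetical toy DAG encoding "is a sub-category of" relations.
toy_dag = {
    "poodle": ["dog"],
    "dog": ["mammal", "pet"],
    "cat": ["mammal", "pet"],
    "mammal": ["animal"],
    "pet": ["animal"],
    "animal": [],
}
rng = random.Random(0)
tags = random_walk_tags(toy_dag, "poodle", steps=2, rng=rng)
```

Tags generated this way co-occur only if they lie close to each other along a branch of the DAG, which is the kind of structure a co-occurrence statistic could then pick up.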
Komplex Hálózatok Moduláris Szerkezete = Modular Structure of Complex Networks
We developed a method enabling the tracking of communities in time-evolving networks. We studied the statistical properties of community evolution in large social networks, and revealed interesting non-trivial relations between the size, stationarity and survival probability of communities. We extended the clique percolation method to handle directed and weighted networks, and analyzed numerous real networks with these new algorithms. The behavior of the directed communities classified the examined systems into two major groups, whereas the studies of the weighted networks revealed interesting link weight correlations. We located functional units with the help of the clique percolation method in the network of microRNAs and their regulated mRNAs, and developed bioinformatics tools for signal transduction networks, helping the prediction of drug target proteins. Relating to the field of network hierarchy, we studied the statistical features of tagged networks where the tags were hierarchically organized. According to our results, the examined networks showed an interesting self-similarity when restricted to the tag-induced sub-graphs. Relating to the studies of hierarchy, we developed a random graph generator based on a self-similar, hierarchical multifractal link probability measure. We have shown that this method is capable of generating random networks with very diverse properties.